Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sync: avoid creating mutex and rwlock resource_span with parent, use None parent instead #6107

Merged
merged 3 commits into from
Nov 14, 2023

Conversation

xuorig
Copy link
Contributor

@xuorig xuorig commented Oct 24, 2023

Most of the context for this PR can be found in this issue.

The TRACE resource_span that Mutex and RWLock create when using new can keep open their parent for as long as the mutex lives. This pull request changes these spans to have a parent of None rather than having a parent-child relation ship with the current span.

As mentioned by @davidbarsky in the linked issue, another possibility could be to alter the API and accept a parent span. I'm open to changing this PR although I agree it feels strange when it comes to tracing.

Looking for existing tests, I wasn't sure if regression testing this (which would require bringing a tracing subscriber most likely) was necessary here. Open to adding this as well.

@github-actions github-actions bot added the R-loom-sync Run loom sync tests on this PR label Oct 24, 2023
@NobodyXu
Copy link
Contributor

OnceCell, Semaphore etc probably also have the same issue, IMO it's a good idea to also change that in this PR

@Darksonn Darksonn added A-tokio Area: The main tokio crate M-sync Module: tokio/sync M-tracing Tracing support in Tokio labels Oct 25, 2023
@Darksonn
Copy link
Contributor

Yeah, if possible I'd like to fix all instances of this.

I don't think we have many tests for the tracing stuff.

Open to adding this as well.

If you want to add tests, then those are always welcome.

Copy link
Member

@davidbarsky davidbarsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Alice and Jiahao–can you make changes to tokio::sync:OnceCell, tokio::sync::Semaphore, and tokio::sync::Barrier similar to my suggestions on tokio::sync::RwLock/tokio::sync::Mutex ?

tokio/src/sync/mutex.rs Show resolved Hide resolved
tokio/src/sync/rwlock.rs Show resolved Hide resolved
@davidbarsky
Copy link
Member

@xuorig On the subject of tests, I know that @hds was looking at adding tests for resources in Tokio using tracing-mock. What are your timelines for needing this to land? Would y'all be able to take a dependency on an unmerged PR to Tokio for a few days or nah?

@xuorig
Copy link
Contributor Author

xuorig commented Oct 25, 2023

@davidbarsky absolutely, this can wait. I've added a mock currently but happy to swap it out later 👍

@xuorig xuorig force-pushed the xuorig/mutex-resource-span branch 4 times, most recently from 5bd0a2f to 5a74dd0 Compare October 25, 2023 14:09
@xuorig
Copy link
Contributor Author

xuorig commented Oct 26, 2023

I just committed a test using tracing-mock before realizing this PR is introducing it more generally. Will wait for it to be merged and modify this PR.

@Darksonn
Copy link
Contributor

Darksonn commented Nov 5, 2023

The other PR is merged now.

Copy link
Contributor

@hds hds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some prototyping with tokio-console (the console-subscriber specifically) and discussions with @hawkw and @davidbarsky, I believe we should not add follows_from relationship to the resource span (and for the same reasons, I don't plan to add it to the task span either).

TLDR: I propose that this change should only add parent: None to these resource spans and make no other changes.

The reason for this is a bit long, but I felt it made sense to write it out so that the justification is available when people come looking at this change.

The reason is that using follows_from with the current span is a bit delicate. You're using the global current span, but specific tracing-subscriber Layers may not be recording that span (it could be filtered out) and this results in the layer never receiving a call to on_follows_from.

Here is an example (I'm illustrating based on task spans, but the same is the case for the sleep resource):

async fn spawn_parent() {
    tokio::task::Builder::new()
        .name("parent_ends")
        .spawn(parent_spawns_child())
        .unwrap();
}

#[tracing::instrument]
async fn parent_spawns_child() {
    tokio::task::Builder::new()
        .name("child_continues")
        .spawn(async {
            loop {
                tokio::time::sleep(Duration::from_secs(100)).await;
            }
        })
        .unwrap();

    tokio::time::sleep(Duration::from_secs(5)).await;
}

Because of the #[tracing::instrument] on parent_spawns_child(), the span hierarchy looks like this:

- runtime.spawn[task.name=parent_ends] (id=1)
  - parent_spawns_child (id=274877906945)
    - runtime.spawn[task.name=child_continues] (id=274877906946)

And that parent_spans_child span in the middle means that a Layer which is only looking for tokio instrumentation (i.e. console-subscriber) will never receive a call to on_follow_from. If I take the #[tracing::instrument] out, then it will.

This means that the console-subscriber (or any subscriber layer) won't be able to rely on follows_from unless it registers for all spans which are active. On the other side, the behavior which we proposed here (providing a "weak" connection to the context within which a resource or task is created) can be safely done within the subscriber layer itself, as the layer has the current span stack from its point of view (containing only spans that it is aware of) and in the example above, the console-subscriber would not see the parent_spans_child span and could itself keep a connection between the two spans that won't cause the parent span to remain open longer than it should.

@xuorig
Copy link
Contributor Author

xuorig commented Nov 10, 2023

PR updated:

  • Using parent None and no follows_from relation ship as suggest by @hds.
  • Since batch_semaphore is only used internally and also since it does seem to have a parent-child relationship with its callers, I've omitted parent: None from that resource span. Let me know if that makes sense to you all.
  • Tests added for all new functions of the sync primitives in sync.

…arent instead

A child span stored on sync primitives can keep the parent span open,
unable to be closed by subscribers due to the sync resource referencing it.

Fixes: tokio-rs#6106
Copy link
Member

@davidbarsky davidbarsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, but i'll let @hds confirm that it's correct.

Copy link
Contributor

@hds hds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thanks for your work on this change!

I will just note here that the tests (which are great and something very valuable to add), don't test the change, which is to say they're not testing that the resource spans are being created as explicit roots (parent: None). However, there's actually a defect in tracing-mock which means that you can't test for that right now anyway (tokio-rs/tracing#2440).

Once that defect gets fixed, these tests can be updated, but for now we have significantly more test coverage than before, so let's go ahead and merge this change!

@davidbarsky davidbarsky merged commit 7b55518 into tokio-rs:master Nov 14, 2023
71 checks passed
@xuorig xuorig deleted the xuorig/mutex-resource-span branch November 14, 2023 23:57
@xuorig
Copy link
Contributor Author

xuorig commented Nov 15, 2023

@hds one question about this, just for my own learnings:

don't test the change, which is to say they're not testing that the resource spans are being created as explicit roots

I was under the impression they did:

.new_span(mutex_span.clone().with_explicit_parent(None))

Now I understand there's a defect with the above, but are you saying this wasn't the way to do it in the first place?

@hds
Copy link
Contributor

hds commented Nov 15, 2023

@xuorig Your expectations are correct. But they're not actually testing what you want them to because there is no contextual parent span when the resource gets created.

To complete the test, we would need to do something like the following (for the Barrier test):

#[tokio::test]
async fn test_barrier_creates_span() {
    let barrier_span = expect::span()
        .named("runtime.resource")
        .with_target("tokio::sync::barrier");

    let size_event = expect::event()
        .with_target("runtime::resource::state_update")
        .with_fields(expect::field("size").with_value(&1u64));

    let arrived_event = expect::event()
        .with_target("runtime::resource::state_update")
        .with_fields(expect::field("arrived").with_value(&0));

    let contextual_parent = expect::span().named("contextual parent");

    let (subscriber, handle) = subscriber::mock()
        // We need to expect our contextual parent span is created and entered
        .new_span(contextual_parent.clone())
        .enter(contextual_parent)
        .new_span(barrier_span.clone().with_explicit_parent(None))
        .enter(barrier_span.clone())
        .event(size_event)
        .event(arrived_event)
        .exit(barrier_span.clone())
        .drop_span(barrier_span)
        .run_with_handle();

    {
        let _guard = tracing::subscriber::set_default(subscriber);
        {
            // We enter into a span so that the barrier will have a contextual parent.
            let _span = tracing::info_span!("contextual parent").entered();
            let _barrier = sync::Barrier::new(1);
        }
    }

    handle.assert_finished();
}

If you run this test on current master it will pass. But if you run it it without the changes in this PR, it will also pass. That's the defect (along with another thing) in tracing-mock that needs to be fixed.

hds added a commit that referenced this pull request Nov 18, 2023
In Tokio, tasks are optionally instrumented with tracing spans to allow
analysis of the runtime behavior to be performed with tools like
tokio-console.

The span that is created for each task gets currently follows the
default tracing behavior and has a contextual parent attached to it
based on the span that is actual when `tokio::spawn` or similar is
called.

However, in tracing, a span will remain "alive" until all its children
spans are closed. This doesn't match how spawned tasks work. A task may
outlive the context in which is was spawned (and frequently does). This
causes tasks which spawn other - longer living - tasks to appear in
`tokio-console` as having lost their waker when instead they should be
shown as completed (tokio-rs/console#345). It can also cause undesired
behavior for unrelated tracing spans if a subscriber is receiving both
the other spans as well as Tokio's instrumentation.

To fix this mismatch in behavior, the task span has `parent: None` set
on it, making it an explicit root - it has no parent. The same was
already done for all spans representing resources in #6107. This change
is made within the scope of #5792.

Due to a defect in the currently available `tracing-mock` crate, it is
not possible to test this change at a tracing level
(tokio-rs/tracing#2440). Instead, a test for the `console-subscriber`
has been written which shows that this change fixes the defect as
observed in `tokio-console` (tokio-rs/console#490).
hds added a commit that referenced this pull request Nov 19, 2023
In Tokio, tasks are optionally instrumented with tracing spans to allow
analysis of the runtime behavior to be performed with tools like
tokio-console.

The span that is created for each task gets currently follows the
default tracing behavior and has a contextual parent attached to it
based on the span that is actual when `tokio::spawn` or similar is
called.

However, in tracing, a span will remain "alive" until all its children
spans are closed. This doesn't match how spawned tasks work. A task may
outlive the context in which is was spawned (and frequently does). This
causes tasks which spawn other - longer living - tasks to appear in
`tokio-console` as having lost their waker when instead they should be
shown as completed (tokio-rs/console#345). It can also cause undesired
behavior for unrelated tracing spans if a subscriber is receiving both
the other spans as well as Tokio's instrumentation.

To fix this mismatch in behavior, the task span has `parent: None` set
on it, making it an explicit root - it has no parent. The same was
already done for all spans representing resources in #6107. This change
is made within the scope of #5792.

Due to a defect in the currently available `tracing-mock` crate, it is
not possible to test this change at a tracing level
(tokio-rs/tracing#2440). Instead, a test for the `console-subscriber`
has been written which shows that this change fixes the defect as
observed in `tokio-console` (tokio-rs/console#490).
kodiakhq bot pushed a commit to pdylanross/fatigue that referenced this pull request Dec 11, 2023
Bumps tokio from 1.34.0 to 1.35.0.

Release notes
Sourced from tokio's releases.

Tokio v1.35.0
1.35.0 (December 8th, 2023)
Added

net: add Apple watchOS support (#6176)

Changed

io: drop the Sized requirements from AsyncReadExt.read_buf (#6169)
runtime: make Runtime unwind safe (#6189)
runtime: reduce the lock contention in task spawn (#6001)
tokio: update nix dependency to 0.27.1 (#6190)

Fixed

chore: make --cfg docsrs work without net feature (#6166)
chore: use relaxed load for unsync_load on miri (#6179)
runtime: handle missing context on wake (#6148)
taskdump: fix taskdump cargo config example (#6150)
taskdump: skip notified tasks during taskdumps (#6194)
tracing: avoid creating resource spans with current parent, use a None parent instead (#6107)
tracing: make task span explicit root (#6158)

Documented

io: flush in AsyncWriteExt examples (#6149)
runtime: document fairness guarantees and current behavior (#6145)
task: document cancel safety of LocalSet::run_until (#6147)

#6001: tokio-rs/tokio#6001
#6107: tokio-rs/tokio#6107
#6144: tokio-rs/tokio#6144
#6145: tokio-rs/tokio#6145
#6147: tokio-rs/tokio#6147
#6148: tokio-rs/tokio#6148
#6149: tokio-rs/tokio#6149
#6150: tokio-rs/tokio#6150
#6158: tokio-rs/tokio#6158
#6166: tokio-rs/tokio#6166
#6169: tokio-rs/tokio#6169
#6176: tokio-rs/tokio#6176
#6179: tokio-rs/tokio#6179
#6189: tokio-rs/tokio#6189
#6190: tokio-rs/tokio#6190
#6194: tokio-rs/tokio#6194



Commits

92a3455 chore: prepare Tokio v1.35.0 (#6197)
1968565 chore: use relaxed load for unsync_load (#6203)
c9273f1 sync: improve safety comments for WakeList (#6200)
e05d0f8 changelog: fix missing link for 1.8.2 (#6199)
debcb22 Revert "net: add SocketAddr::as_abstract_namespace (#6144)" (#6198)
83b7397 io: drop the Sized requirements from AsyncReadExt.read_buf (#6169)
3991f9f docs: fix typo in 'tokio/src/sync/broadcast.rs' (#6182)
48c0e62 chore: use relaxed load for unsync_load on miri (#6179)
d561b58 taskdump: skip notified tasks during taskdumps (#6194)
3a4aef1 runtime: reduce the lock contention in task spawn (#6001)
Additional commits viewable in compare view




Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot merge will merge this PR after your CI passes on it
@dependabot squash and merge will squash and merge this PR after your CI passes on it
@dependabot cancel merge will cancel a previously requested merge and block automerging
@dependabot reopen will reopen this PR if it is closed
@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
hds added a commit that referenced this pull request Jul 26, 2024
When instrumenting resources in Tokio, a span is created for each
resource. Previously, all resources inherited the currently active span
as their parent (tracing default). However, this would keep that parent
span alive until the resource (and its span) were dropped. This is often
not correct, as a resource may be created in a task and then sent
elsewhere, while the originating task ends.

This artificial extension of the parent span's lifetime would make it
look like that task was still alive (but idle) in any system reading the
tracing instrumentation in Tokio, for example Tokio Console as reported
in tokio-rs/console#345.

In #6107, most of the existing resource spans were updated to
make them explicit roots, so they have no contextual parent. However,
2. were missed:
- `Sleep`
- `BatchSemaphore`

This change alters the resource spans for those 2 resources to also make
them explicit roots.

(so that the span doesn't have a parent). This
is necessary, because otherwise a resource that is created inside a task
(or some other span) and then sent elsewhere will keep that contextual
parent span open (but not necessarily active) for the lifetime of the
resource itself.
hds added a commit that referenced this pull request Jul 26, 2024
When instrumenting resources in Tokio, a span is created for each
resource. Previously, all resources inherited the currently active span
as their parent (tracing default). However, this would keep that parent
span alive until the resource (and its span) were dropped. This is often
not correct, as a resource may be created in a task and then sent
elsewhere, while the originating task ends.

This artificial extension of the parent span's lifetime would make it
look like that task was still alive (but idle) in any system reading the
tracing instrumentation in Tokio, for example Tokio Console as reported
in tokio-rs/console#345.

In #6107, most of the existing resource spans were updated to
make them explicit roots, so they have no contextual parent. However,
2. were missed:
- `Sleep`
- `BatchSemaphore`

This change alters the resource spans for those 2 resources to also make
them explicit roots.
hds added a commit that referenced this pull request Jul 29, 2024
…ts (#6727)

When instrumenting resources in Tokio, a span is created for each
resource. Previously, all resources inherited the currently active span
as their parent (tracing default). However, this would keep that parent
span alive until the resource (and its span) were dropped. This is often
not correct, as a resource may be created in a task and then sent
elsewhere, while the originating task ends.

This artificial extension of the parent span's lifetime would make it
look like that task was still alive (but idle) in any system reading the
tracing instrumentation in Tokio, for example Tokio Console as reported
in tokio-rs/console#345.

In #6107, most of the existing resource spans were updated to
make them explicit roots, so they have no contextual parent. However,
2. were missed:
- `Sleep`
- `BatchSemaphore`

This change alters the resource spans for those 2 resources to also make
them explicit roots.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate M-sync Module: tokio/sync M-tracing Tracing support in Tokio R-loom-sync Run loom sync tests on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants